Abstract


This work aimed to model the organic pollution of the waters of the Déganobo Lake system with three models: a Multiple Linear Regression model (MLR model), a Multilayer Perceptron model (MLP model) and a Multiple Linear Regression/Multilayer Perceptron hybrid model (MLR/MLP hybrid model). The chemical oxygen demand (COD) of these waters, measured from August 2021 to July 2022, was used for this purpose. Two approaches were applied when modeling COD with the MLP model and the MLR/MLP hybrid model: static modeling and dynamic modeling. The results highlighted the poor predictions of the COD of these waters by the MLR model (36.2 %) and by the MLP models (6-8-1 for the static modeling and 7-3-1 for the dynamic modeling), both of which explained less than 35% of the experimental values with high errors (RMSE higher than 1.30 and relative error higher than 0.750). However, the MLR/MLP hybrid models (MLR/6-3-1 for the static modeling and MLR/7-3-1 for the dynamic modeling) both predicted the COD of these waters well, at around 99%, with very low errors (test-phase RMSE of 0.0949 and 0.1000 and relative errors of 0.006 and 0.002, respectively). The MLR/MLP hybrid model was therefore the most efficient for predicting the COD of these waters. This study thus provides further evidence of the accuracy of this hybrid model for ecological modeling.


This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.


Introduction 


Organic matter plays a fundamental role in the aquatic environment because of its importance in biogeochemical reactions. However, its excessive presence in surface waters contributes to ecological problems, with serious health risks as a consequence [1, 2]. In general, excessive organic matter in aquatic ecosystems generates large quantities of nitrogen and phosphorus nutrients, the main cause of eutrophication [3]. Under some biogeochemical and physical conditions, excessive organic matter in these ecosystems could also be an additional source of metal pollution [1]. The presence of non-biodegradable organic matter in surface waters, such as hydrocarbons and pesticides, could also lead to serious ecological and health risks [4, 5]. Knowledge of the organic matter in waters, particularly surface waters, has always been important for assessing their quality. The organic pollution of waters is assessed through several parameters, the most used of which are the chemical oxygen demand (COD) and the biochemical oxygen demand (BOD). COD and BOD are approximate measurements of the quantities of oxygen required for the chemical and biochemical degradation of organic matter, respectively. Because of how they are measured experimentally, BOD underestimates organic pollution, while COD overestimates it [6, 7]. Therefore, COD would be more advantageous for modeling organic pollution, inasmuch as modeling is both the representation of facts and a statistical theoretical approach based on their spatial and temporal evolution [8, 9].
Modeling is one of the commonly used short-term and long-term methods for ecological monitoring. Artificial neural networks (ANNs), which are black-box models, are increasingly used to develop prediction models for complex systems as the theory behind them develops and the processing power of computers increases [10, 11]. This is notably the case for modeling the COD of surface waters [12, 13]. The multilayer perceptron (MLP), one of the many variants of ANNs, is suited to this purpose, as highlighted by many recent studies, such as those of Ay and Kisi [14], Bachir et al. [15] and Selim et al. [16]. The ability of the multiple linear regression model (MLR model) to model the COD of surface waters has also been demonstrated by many recent studies [16, 17]. One model increasingly used for ecological modeling is the multiple linear regression/multilayer perceptron hybrid model (MLR/MLP hybrid model). This hybrid model aims to provide a good fit to experimental data. Hybrid models based on the MLP model are used when neither the MLR model nor the MLP model alone provides a good fit to the experimental values. Using this hybrid model involves two steps: first, the values of the dependent variable(s) are adjusted by the MLR model as a function of the relevant independent variables; then, these MLR-adjusted values are used as the output, and the independent variables as the input parameters, of the MLP model. Many studies have highlighted the accuracy of this hybrid model in ecological modeling, as reported by Adnan et al. [18], Kamisan et al. [19], Lola et al. [20], Manssouri et al. [21] and Yao et al. [22-24].

The Déganobo Lake system, located in the urban center of San-Pédro city, is one of the tourist attractions of this seaside town [25]. It has remarkable biodiversity [26]. However, it currently receives untreated anthropogenic discharges of all kinds from its watershed, which leads to its relatively high pollution. Indeed, Konan and Yao [27] highlighted the high organic pollution of its waters, with serious ecological risks in all seasons. The seasonal mean values of their COD were higher than 220 mg O2/L, and the annual mean value was 296.05 mg O2/L from August 2021 to July 2022. It is therefore important to take actions and decisions for the protection and sustainable development of this lake system. Knowledge of the static and dynamic evolution of its organic pollution over short and/or long periods could contribute to this. This study aimed to model the COD of its waters with three models: the MLR model, the MLP model and the MLR/MLP hybrid model.


Materials and Methods 


Presentation of the study area 


The Déganobo Lake system is geolocated at 6.63775° W and 4.75046° N. It consists of two lakes: Lac Ouest, with a current open-water surface area of 49.05 ha, and Lac Est, with a current open-water surface area of 28.87 ha [27, 28] (Fig. 1). It has a rich hydrology, made up of the San-Pédro River and the Digboué lagoon, which are linked by numerous wetlands [26, 27, 29]. Its hydrochemistry is linked to the rainfall in the San-Pédro Department [27].
Fig. 1 Geo-localization of the Déganobo Lake system (Map source: Traoré [25], cited by Konan and Yao [27]).


This lake system is under strong anthropogenic pressure because it receives all kinds of discharges from its watershed. This situation leads to its relatively high pollution, as revealed by AIP [26], Konan and Yao [27] and Ogou and Bidi [30].


Data collection 


The monthly values of pH, conductivity, salinity, temperature, redox potential and COD of these waters used in this study were obtained from the work of Konan and Yao [27] for the period from August 2021 to July 2022. The monthly rainfall for the same period was downloaded from the website “historiqueméteo.net” [31, 32].


Implementation of the MLR model, MLP model 
and MLR/MLP hybrid model 


These models were implemented following the same procedure as Yao et al. [23, 24]. In their development, the pH, redox potential, salinity, conductivity and temperature of the waters of the Déganobo Lake system, as well as the rainfall in the San-Pédro Department, were taken into account because they play important roles in the dynamics of the COD of these waters from August 2021 to July 2022, as highlighted by Konan and Yao [27].


Multiple linear regression model (MLR model) 


In this study, the MLR model was developed with the monthly COD (COD) of these waters as the dependent variable, and the monthly pH (pH), salinity (Sal), conductivity (Cond), temperature (T) and redox potential (U) of these waters, as well as the monthly rainfall (Rain) in the San-Pédro Department, as independent variables. The MLR model was fitted with IBM SPSS Statistics V20 software using a dataset of 1048 data. All calculations were performed in double precision. The model obtained was validated if two conditions were met simultaneously: the determination coefficient of the MLR model (R²MLR) is greater than 0.5, i.e., the MLR model explains more than 50% of the variance of the experimental COD values of these waters, and the p-value is less than 0.05 (5%).
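The MLR step and its acceptance rule can be sketched in Python as below. This is a minimal illustration, assuming ordinary least squares with an intercept (the study itself used IBM SPSS Statistics V20), and the function names `fit_mlr` and `mlr_is_valid` are hypothetical. The overall p-value of the regression would come from the F distribution, for example `scipy.stats.f.sf(f_stat, p, n - p - 1)`; it is passed in rather than computed here.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares fit of y = b0 + b1*x1 + ... + bp*xp.

    Returns the coefficients (intercept first), R², adjusted R² and the
    overall F statistic of the regression.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    y_hat = A @ coef
    ss_res = np.sum((y - y_hat) ** 2)             # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)          # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    f_stat = (r2 / p) / ((1.0 - r2) / (n - p - 1))
    return coef, r2, r2_adj, f_stat


def mlr_is_valid(r2, p_value, r2_min=0.5, alpha=0.05):
    """Acceptance rule from the text: R² > 0.5 and p-value < 0.05."""
    return r2 > r2_min and p_value < alpha


# Synthetic example (not study data): two predictors with small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=50)
coef, r2, r2_adj, f_stat = fit_mlr(X, y)
print(np.round(coef, 2), round(r2, 3))
```

With the values reported in Table 1 (R²MLR = 0.362, p-value = 0.907), `mlr_is_valid(0.362, 0.907)` returns `False`, which matches the rejection of the MLR model in the Results.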


Multilayer perceptron model (MLP model) 


Fig. 2 Architecture of the MLR/6-3-1 hybrid model obtained in the case of the static modeling of the chemical oxygen demand of the waters from the Déganobo Lake system.

Biaser, bias; Month, time; pH, pH of these waters; U, redox potential of these waters; Sal, salinity of these waters; Cond, conductivity of these waters; T, temperature of these waters; Rain, monthly rainfall; CODmlr, chemical oxygen demand of lake waters obtained with the MLR model; blue lines show positive synaptic weights and grey lines show negative synaptic weights.

Fig. 3 Architecture of the MLR/7-3-1 hybrid model obtained in the case of the dynamic modeling of the chemical oxygen demand of the waters from the Déganobo Lake system.

Biaser, bias; Month, time; pH, monthly pH of these waters; U, monthly redox potential of these waters; Sal, monthly salinity of these waters; Cond, monthly conductivity of these waters; T, monthly temperature of these waters; Rain, monthly rainfall; CODmlr, monthly chemical oxygen demand of lake waters obtained with the MLR model; blue lines show positive synaptic weights and grey lines show negative synaptic weights.

Two approaches were considered in the implementation of the MLP models: the static modeling and the dynamic modeling of the COD of these waters. For the static modeling, the monthly COD of these waters was the output parameter, and the monthly pH (pH), salinity (Sal), conductivity (Cond), temperature (T) and redox potential (U) of these waters, as well as the monthly rainfall in the San-Pédro Department, were the input parameters. For the dynamic modeling, the time (month) was added to the input parameters of the static modeling. The different MLP models were built with IBM SPSS Statistics V20 software. A dataset of 1048 data was used for the static modeling of the COD of these waters and a dataset of 1060 data for the dynamic modeling. In the development of the MLP models, each dataset was partitioned into three subsets: 40% for the learning phase, 30% for the validation phase and 30% for the test phase. The network had one hidden layer. The transfer function of the hidden-layer neurons was the hyperbolic tangent (tanh), a sigmoid function, and that of the output-layer neuron was the identity function (y = x). Before processing, the values of the input and output parameters were normalized according to equation (1) and coded in a range between 0 and n (n being an integer).


$$X_{ni} = \frac{2\,(X_i - X_{\min})}{X_{\max} - X_{\min}} - 1 \qquad (1)$$
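Equation (1) maps each variable linearly onto [−1, 1]. The sketch below applies it column by column and then splits the samples 40/30/30 into learning, validation and test subsets, as described above. It assumes a random assignment of samples to the three subsets (the text does not state how SPSS assigned them), and `normalize` and `partition` are hypothetical helper names.

```python
import numpy as np


def normalize(x):
    """Equation (1): X_ni = 2 (X_i - X_min) / (X_max - X_min) - 1, per column.

    Assumes no column is constant (X_max > X_min).
    """
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0


def partition(n_samples, fractions=(0.4, 0.3, 0.3), seed=0):
    """Randomly split sample indices into learning, validation and test subsets.

    The test subset takes whatever remains after the first two fractions.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_learn = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    return idx[:n_learn], idx[n_learn:n_learn + n_val], idx[n_learn + n_val:]


scaled = normalize([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
print(scaled)  # each column becomes [-1, 0, 1]
learn, val, test = partition(100)
print(len(learn), len(val), len(test))  # 40 30 30
```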
The network weights were initialized before being adjusted during the learning phase to obtain a low error. The Levenberg-Marquardt algorithm was used to speed up the learning phase. The learning rate was initially set to 0.4 and gradually decreased to 0.001 at + 0.5 steps. The network architecture was optimized by trial and error: the number of neurons in the hidden layer varied from 1 to 10. For each number of hidden neurons, the simulations were performed 2000 times, and the best result (simultaneously the highest determination coefficients in the learning phase (R²learning) and in the test phase (R²test)) of the corresponding network architecture was recorded. For each case of COD modeling, the best model was the one with the highest determination coefficient (R²), defined as the mean of R²learning and R²test (equation (2)):


$$R^2 = \frac{R^2_{\text{learning}} + R^2_{\text{test}}}{2} \qquad (2)$$


This model is validated if the two following conditions are met: R²test is higher than 0.5, i.e., the model explains more than 50% of the variance of the experimental COD values of these waters in the test phase; and RMSEtest (root mean square error in the test phase) and REtest (relative error in the test phase) are very low, the lowest of all the MLP models established.
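The selection rule of equation (2) and the R²test > 0.5 condition can be checked against the static MLP results reported in Table 3. The plain-Python sketch below (hypothetical names `mean_r2`, `select_best` and `passes_r2_test`) ranks the architectures by mean R² and tests only the R²test threshold; the RMSEtest and REtest comparison is left to inspection of the table.

```python
# (architecture, R²learning, R²test, RMSEtest, REtest), values copied from Table 3
TABLE_3 = [
    ("6-1-1", 0.0013, 0.0160, 1.5192, 1.1430),
    ("6-2-1", 0.1502, 0.0093, 2.5132, 0.9330),
    ("6-3-1", 0.0074, 0.0106, 1.2124, 1.8380),
    ("6-4-1", 0.0267, 0.0192, 2.1610, 1.0120),
    ("6-5-1", 0.0310, 0.0010, 2.8609, 0.8630),
    ("6-6-1", 0.0875, 0.1670, 0.9644, 0.9740),
    ("6-7-1", 0.1152, 0.3975, 2.2338, 0.8650),
    ("6-8-1", 0.3824, 0.3088, 1.8942, 0.7590),
    ("6-9-1", 0.0294, 0.0000, 2.5558, 1.0570),
    ("6-10-1", 0.3681, 0.0633, 0.6595, 0.7040),
]


def mean_r2(r2_learning, r2_test):
    """Equation (2): mean of the learning-phase and test-phase R²."""
    return (r2_learning + r2_test) / 2.0


def select_best(rows):
    """Return the architecture with the highest mean R²."""
    return max(rows, key=lambda row: mean_r2(row[1], row[2]))


def passes_r2_test(row, threshold=0.5):
    """First validation condition: R²test above 0.5."""
    return row[2] > threshold


best = select_best(TABLE_3)
print(best[0], round(mean_r2(best[1], best[2]), 4), passes_r2_test(best))
# prints: 6-8-1 0.3456 False
```

Applied to the rows of Table 5, the same two functions select MLR/6-3-1 (mean R² of 0.9950), whose R²test of 0.9930 exceeds 0.5.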


Multiple linear regression-multilayer perceptron 
hybrid model (MLR/MLP hybrid model) 


Two approaches were also used with this hybrid model: the static modeling and the dynamic modeling of the COD of these waters. In its implementation, the COD of these waters was first modeled by the MLR model under the same conditions as described above for the MLR model. The COD values calculated with the equation established by the MLR model were then modeled statically and dynamically by the MLP model, with the same input parameters as for the MLP models. The selection of the best MLR/MLP hybrid model in each case of COD modeling and the validation conditions were the same as for the MLP models.
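The two-stage procedure described above can be sketched as follows. This is a minimal NumPy illustration on synthetic data, not the SPSS workflow used in the study: it assumes ordinary least squares for the MLR stage and, for the MLP stage, plain full-batch gradient descent on the mean squared error in place of the Levenberg-Marquardt algorithm. The network mirrors the architecture described for the MLP models (one tanh hidden layer, an identity output neuron); the helper names are hypothetical.

```python
import numpy as np


def mlr_fitted_values(X, y):
    """Stage 1: least-squares MLR of y on X; return the MLR-adjusted values."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef


def train_mlp(X, t, n_hidden=3, lr=0.05, epochs=3000, seed=0):
    """Stage 2: fit a tanh-hidden / identity-output MLP to the targets t.

    Uses full-batch gradient descent on the mean squared error (a stand-in
    for Levenberg-Marquardt). Returns the parameters and the loss history.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.normal(scale=0.5, size=(p, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    history = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)        # hidden-layer activations
        out = H @ w2 + b2               # identity transfer at the output neuron
        err = out - t
        history.append(float(np.mean(err ** 2)))
        # Backpropagate the gradient of the mean squared error.
        g_out = 2.0 * err / n
        g_w2 = H.T @ g_out
        g_b2 = g_out.sum()
        g_H = np.outer(g_out, w2) * (1.0 - H ** 2)  # derivative of tanh
        g_W1 = X.T @ g_H
        g_b1 = g_H.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (W1, b1, w2, b2), history


def predict(params, X):
    """Forward pass of the trained network."""
    W1, b1, w2, b2 = params
    return np.tanh(X @ W1 + b1) @ w2 + b2


# Synthetic, already normalized inputs (6 predictors, as in the static models).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(40, 6))
y = 1.0 + X @ np.array([0.5, -0.3, 0.2, 0.0, 0.1, -0.4]) + 0.05 * rng.normal(size=40)

t = mlr_fitted_values(X, y)          # hybrid target: MLR-adjusted values
params, history = train_mlp(X, t)    # MLP trained on the MLR output
pred = predict(params, X)
print(history[0], history[-1])
```

As in the description above, the MLP stage is trained on the MLR-adjusted values (`t`), not on the measured values (`y`).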


Results 
Multiple linear regression model (MLR model) 


The MLR model explains 36.2% of the experimental values of the COD of these waters (Table 1). As its R²MLR is less than 0.5 and its p-value is greater than 0.05 (Tables 1 and 2), this model is not accurate for this purpose in this study. There is therefore no good linear relationship between the COD of these waters and the independent variables used in this case.


Table 1 Some statistical parameters of the MLR model 
obtained in this study. 


RMLR R²MLR R²MLR adjusted p-value
0.190 0.362 - 0.907


Table 2 Coefficient and p-value of each parameter obtained 
with the MLR model. 


Parameters Coefficient p-value 

Intercept 784.4773 0.112085
pH 6.7442 0.894993
Redox potential -0.0415 0.957740
Salinity -44.3410 0.835169 
Conductivity 0.1078 0.803677 
Temperature -18.1034 0.246282 
Rainfall -0.0615 0.773322 


Table 3 Statistical parameters for MLP models obtained in
the case of the static modeling of the chemical oxygen demand
of the waters from the Déganobo Lake system.


MLP model R²learning R²test R² RMSEtest REtest


6-1-1 0.0013 0.016 0.0087 1.5192 1.1430 
6-2-1 0.1502 0.0093 0.0798 2.5132 0.9330 
6-3-1 0.0074 0.0106 0.0090 1.2124 1.8380 
6-4-1 0.0267 0.0192 0.0230 2.1610 1.0120 
6-5-1 0.031 0.001 0.0160 2.8609 0.8630 
6-6-1 0.0875 0.1670 0.1273 0.9644 0.9740 
6-7-1 0.1152 0.3975 0.2564 2.2338 0.8650 
6-8-1 0.3824 0.3088 0.3456 1.8942 0.7590 
6-9-1 0.0294 0.0000 0.0147 2.5558 1.0570 
6-10-1 0.3681 0.0633 0.2157 0.6595 0.7040 
Table 4 Statistical parameters for MLP models obtained in the
case of the dynamic modeling of the chemical oxygen demand
of the waters from the Déganobo Lake system.


MLP model R²learning R²test R² RMSEtest REtest
7-1-1 0.0666 0.0349 0.0508 0.9648 1.0367 
7-2-1 0.0008 0.0017 0.0013 3.2334 1.0470 
7-3-1 0.3363 0.0612 0.1988 1.3038 0.8990 
7-4-1 0.0030 0.0877 0.0454 2.0030 0.9100 
7-5-1 0.0900 0.0207 0.0554 1.1100 1.3760 
7-6-1 0.0631 0.0039 0.0335 0.4806 1.0110 
7-7-1 0.5336 0.0508 0.2922 2.3994 1.0620
7-8-1 0.0000 0.0190 0.0095 0.8803 1.3610 
7-9-1 0.1390 0.1559 0.1475 1.6559 0.9050 
7-10-1 0.0451 0.0004 0.0223 2.6216 1.1830 
Fig. 4 Experimental values of the chemical oxygen demand of lake waters as a function of the values predicted by the MLR/6-3-1 hybrid model in the test phase.

Fig. 5 Experimental values of the chemical oxygen demand of lake waters as a function of the values predicted by the MLR/7-3-1 hybrid model in the test phase.
Multilayer perceptron model (MLP model)


The statistical parameters (R²learning, R²test, RMSEtest and REtest) of the MLP models obtained in this study for the static and dynamic modeling of the COD of these waters are presented in Tables 3 and 4. The best MLP model for the static modeling of the COD of these waters is 6-8-1. This model explains 30.88% of the experimental COD values in the test phase, which is less than 50%, with relatively high RMSEtest and REtest. The best model for the dynamic modeling of their COD is 7-3-1. This model explains 6.12% of the experimental COD values in the test phase, also less than 50%, again with relatively high RMSEtest and REtest. Therefore, according to the defined conditions, neither model is suitable for predicting the static and dynamic evolution of the COD of the waters of this lake system in this study.


Multiple linear regression-multilayer perceptron 
hybrid model (MLR/MLP hybrid model) 


The best MLR/MLP hybrid models for the static and dynamic modeling of the COD of these waters are MLR/6-3-1 and MLR/7-3-1, respectively. In the test phase, the MLR/6-3-1 model explains 99.30% of the experimental COD values of these waters (R²test = 0.9930; mean R² = 0.9950), while the MLR/7-3-1 model explains 99.85%. These two hybrid models have very low RMSEtest and REtest, among the lowest of all the hybrid models (Tables 5 and 6). The two models, validated according to the defined conditions, are therefore accurate for predicting the COD of these waters. The architectures of these two models are given in Figs. 2 and 3, respectively. The experimental COD values of these waters are plotted as a function of those predicted by these models in Figs. 4 and 5, respectively.


Discussion 


In this study, the poor predictions of the COD of the waters of this lake system by the MLR and MLP models from their salinity, redox potential, pH, conductivity and temperature, as well as the rainfall in its watershed, may reflect the complexity of the biogeochemical reactions within this lake system. Indeed, these physical, chemical and hydrological parameters play important roles in the fate of organic matter in surface waters and, consequently, in the dynamics of their organic pollution [33-36]. This appears to be particularly true for the waters of this aquatic ecosystem during the long dry season, when Konan and Yao [27] noted significant correlations between their COD and their pH, salinity and the rainfall in its watershed using Principal Component Analysis (PCA). The strong anthropogenic pressures on the watershed of this aquatic ecosystem, which have led to its serious ecological degradation [26, 27, 30], may therefore have limited the relevance of these parameters to the dynamics of the COD of its waters over the entire study period of Konan and Yao [27]. This is common in natural waters, especially polluted ones, where the direct interactions between the different forms of pollution and the biogeochemical and physical parameters that play important roles in them are difficult to highlight in most cases [37, 38].


Table 5 Statistical parameters for MLR/MLP hybrid models 
obtained in the case of the static modeling of the chemical 
oxygen demand of the waters from the Déganobo lake system. 


Hybrid model R²learning R²test R² RMSEtest REtest
6-1-1 0.9885 0.9827 0.9856 0.1225 0.005 
6-2-1 0.9748 0.9649 0.9699 0.2588 0.038 
6-3-1 0.9970 0.9930 0.9950 0.0949 0.006 
6-4-1 0.9946 0.9943 0.9945 0.1581 0.009 
6-5-1 0.9498 0.9121 0.9310 0.4733 0.123 
6-6-1 0.9859 0.9440 0.9650 0.4025 0.043 
6-7-1 0.9925 0.9696 0.9811 0.4025 0.042 
6-8-1 0.9850 0.9735 0.9793 0.4231 0.060 
6-9-1 0.9933 0.9635 0.9784 0.1342 0.008 
6-10-1 0.9842 0.9807 0.9825 0.2898 0.048 


Table 6 Statistical parameters for MLR/MLP hybrid models 
obtained in the case of the dynamic modeling of the chemical 
oxygen demand of the waters from the Déganobo lake system. 


Hybrid model R²learning R²test R² RMSEtest REtest
7-1-1 0.9936 0.9933 0.9935 0.1304 0.007 
7-2-1 0.9881 0.9907 0.9894 0.1049 0.007 
7-3-1 0.9979 0.9985 0.9982 0.1000 0.002 
7-4-1 0.9989 0.9921 0.9955 0.2214 0.013 
7-5-1 0.9831 0.9893 0.9862 0.1949 0.016 
7-6-1 0.9755 0.9720 0.9738 0.3131 0.027 
7-7-1 0.9762 0.9603 0.9683 0.3661 0.044 
7-8-1 0.9666 0.9436 0.9551 0.1789 0.082 
7-9-1 0.9769 0.9687 0.9728 0.4324 0.059 
7-10-1 0.9918 0.9848 0.9883 0.3507 0.045 


The accuracy of the MLR/MLP hybrid model in predicting ecological phenomena [18-24] is once again highlighted in this study, where this model predicts more than 99% of the COD of the waters of this lake system with very low errors. This could be explained by the partial linearity introduced beforehand by the MLR model between the COD of these waters and the independent variables used. This has the effect of further revealing the relevance of the independent variables used as input parameters for the MLP model, on the one hand, and of improving the prediction of the dependent variable through its MLR-predicted values, used as the output parameter, on the other. Indeed, the higher the correlations between the input parameters and the output parameter(s), the better the results obtained with the MLP model [22]. This could explain the high accuracy of the MLR/MLP hybrid model for predicting the COD of these waters compared with the MLR and MLP models in this study. Yao et al. [24] made the same observation when modeling the COD of the waters of the Tiagba Lagoon Bay. The superior performance of the MLR/MLP hybrid model over the MLR and MLP models has been reported in many other studies, such as those of Kamisan et al. [19] on load forecasting for a Malaysian city, Lola et al. [20] on modeling chlorophyll-a in the offshore waters of Kuala Terengganu, Manssouri et al. [21] on modeling groundwater quality indicators, and Yao et al. [22] on modeling the eutrophication of the waters of the Tiagba lagoon bay. Overall, hybrid models based on the MLP model have very good accuracy, as reported by many recent studies, including those of Li et al. [39], He et al. [40] and Zhu et al. [41].


Conclusion 


The use of different models in this context has once again highlighted the accuracy of the MLR/MLP hybrid model in representing environmental phenomena, especially those related to the pollution of surface waters. The MLR/MLP hybrid models obtained in this study could serve as a basis for decisions concerning the rehabilitation and protection of this aquatic ecosystem. Further studies on modeling the chemical pollution of the waters of this lake system with this hybrid model, especially pollution by pesticides and polycyclic aromatic hydrocarbons, should be carried out for the same purposes.